Housing Data: Taking a Look at All the Variables and the Counties in the Convex Hull¶

What is a Convex Hull?

A convex hull is the smallest convex polygon that completely encloses a set of points in two-dimensional space (or the smallest convex polyhedron in three-dimensional space). You can picture it by stretching a rubber band around the points and letting it snap tightly around them.
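As a small illustration of the idea (using `scipy.spatial.ConvexHull`, which this notebook imports later), the hull of a 2-D point set can be computed like this; the data here is made up for the example:

```python
import numpy as np
from scipy.spatial import ConvexHull

# Five points: the four corners of the unit square plus its center.
points = np.array([[0, 0], [1, 0], [1, 1], [0, 1], [0.5, 0.5]])

hull = ConvexHull(points)

# hull.vertices holds the indices of the points on the hull boundary;
# the center point is interior and is not among them.
print(points[hull.vertices])
```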

The concept of convex hulls has applications in various fields such as computer science, computational geometry, operations research, and data analysis. It is often used in algorithms to solve problems such as finding the shortest path between two points, identifying the optimal location of a facility, and identifying outliers in a dataset.

What do the Interior Points from a Convex Hull Tell us?

The interior points of a convex hull tell us about the relationship between the points and the hull itself. Points that fall inside the hull are tightly clustered, without significant gaps or variation in their arrangement, which suggests they are similar in some way, such as sharing characteristics or belonging to the same category or group.

In contrast, if there are many points outside the convex hull, this indicates that the points are more widely dispersed and may have more significant differences or variations among them.
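One common way to check which points fall inside a hull, and the approach this notebook's helper functions appear to use, is to build a `matplotlib.path.Path` from the hull's vertices and call `contains_points`. A minimal sketch with made-up coordinates:

```python
import numpy as np
import matplotlib.path as mpath
from scipy.spatial import ConvexHull

# Hull of a 2x2 square.
boundary = np.array([[0, 0], [2, 0], [2, 2], [0, 2]])
hull = ConvexHull(boundary)

# Build a path from the hull's vertices; contains_points treats it as closed.
hull_path = mpath.Path(boundary[hull.vertices])

candidates = np.array([[1, 1], [3, 3]])
inside = hull_path.contains_points(candidates)
print(inside)  # the first candidate is interior, the second is not
```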

In [1]:
import numpy as np #linear algebra
import pandas as pd #data manipulation and analysis
import matplotlib.pyplot as plt #data visualization
import seaborn as sns #data visualization
from scipy.spatial import ConvexHull, convex_hull_plot_2d
import matplotlib.path as mpath
import plotly.express as px
import plotly.graph_objects as go
import textwrap
from urllib.request import urlopen
import json
In [2]:
# Read-in dataframe with cluster ids from KMeans Clustering Analysis
cluster2_df = pd.read_csv("housingClusterData.csv", index_col=0)
cluster2_df.columns = [col.replace(',', '') for col in cluster2_df.columns]
cluster2_df.head()
Out[2]:
Housing units July 1 2021 (V2021) Owner-occupied housing unit rate 2017-2021 Median value of owner-occupied housing units 2017-2021 Median selected monthly owner costs -with a mortgage 2017-2021 Median selected monthly owner costs -without a mortgage 2017-2021 Median gross rent 2017-2021 Building permits 2021 Households 2017-2021 Persons per household 2017-2021 Living in same house 1 year ago of persons age 1 year+ 2017-2021 Language other than English spoken at home of persons age 5 years+ 2017-2021 Households with a computer 2017-2021 Households with a broadband Internet subscription 2017-2021 cluster_id Banned or not County Name
0 12703 73.2 118300 1159 403 593 25.0 10163 2.68 24760.26 495.76 23713.66 21014.55 0 0.0 Adams County, Ohio
1 44707 66.9 126900 1126 457 758 226.0 40671 2.42 88046.22 3151.77 92113.02 86724.51 0 1.0 Allen County, Ohio
2 22513 76.5 143100 1124 432 748 61.0 20531 2.46 45148.71 3191.28 46299.66 43317.65 0 0.0 Ashland County, Ohio
3 46355 71.6 121100 1086 426 735 168.0 38332 2.47 84877.86 6521.58 85948.57 80011.01 0 0.0 Ashtabula County, Ohio
4 26387 59.8 150800 1210 464 840 110.0 22381 2.41 43439.20 3226.91 56222.74 50079.19 0 0.0 Athens County, Ohio

All counties in each cluster¶

The functionsAll.py file contains Python code that imports various libraries for data analysis, visualization, and machine learning.

It defines a function called split_dataframe_by_cluster that takes a pandas DataFrame and the 'cluster_id' column as inputs and splits the DataFrame into multiple DataFrames based on the unique values in the 'cluster_id' column.
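The actual implementation lives in functionsAll.py, so the sketch below is only an assumption about what `split_dataframe_by_cluster` does, based on the description above:

```python
import pandas as pd

def split_dataframe_by_cluster(df, cluster_col):
    """Assumed behavior: return one DataFrame per unique value in cluster_col."""
    return [group for _, group in df.groupby(cluster_col)]

# Tiny example with made-up data.
demo = pd.DataFrame({'value': [1, 2, 3, 4], 'cluster_id': [0, 0, 1, 1]})
parts = split_dataframe_by_cluster(demo, 'cluster_id')
print(len(parts))  # one DataFrame per cluster
```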

Another function called get_cluster_coords_dict is defined, which takes two inputs, a cluster DataFrame and the original DataFrame, and returns a dictionary of the coordinate points for all the counties in each cluster. These functions are used to create a list of coordinate dictionaries for each cluster in the data.

The code then defines three functions, cluster0, cluster1, and cluster2, that each takes a dictionary of cluster coordinates as an input and returns the coordinates for a specific cluster.

Finally, the code defines a function called clusterk_dict that takes a list of cluster DataFrames and a list of coordinate dictionaries as inputs and returns a dictionary where each key is a cluster number and each value holds the coordinate points representing the counties in that cluster.

In [3]:
from functionsAll import split_dataframe_by_cluster, get_cluster_coords_dict,coords, clusterk_dict,\
cluster0, cluster1, cluster2
In [4]:
# Step 1
df_list = split_dataframe_by_cluster(cluster2_df, 'cluster_id')

# Step 2
coords_list = coords(df_list, cluster2_df)

# Step 3
clusterK_dict = clusterk_dict(df_list, coords_list)
In [5]:
# Build one coordinate dictionary per cluster (up to three clusters)
if len(df_list) >= 1:
    cluster0_dict = cluster0(clusterK_dict)
if len(df_list) >= 2:
    cluster1_dict = cluster1(clusterK_dict)
if len(df_list) >= 3:
    cluster2_dict = cluster2(clusterK_dict)

for i in range(len(df_list)):
    print(f"cluster{i}_dict has been made into a dictionary")

cluster0_dict has been made into a dictionary
cluster1_dict has been made into a dictionary
cluster2_dict has been made into a dictionary

Banned counties in each cluster¶

The functionsBanned.py file contains Python code that imports several libraries/modules for data analysis and visualization, machine learning, and geometry operations. The code defines several functions that take a DataFrame as input and perform operations to extract information about clusters of banned counties based on their coordinates.

The filter_banned_counties function filters out clusters of banned counties from a DataFrame based on their cluster ID and the value of a "Banned or not" column. It returns a list of DataFrames containing banned counties if there are more than two banned counties in a cluster or None if there are no banned counties.
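The real function is defined in functionsBanned.py; the sketch below is a hypothetical reconstruction of the behavior described above (the column names mirror the notebook's data, the logic is an assumption):

```python
import pandas as pd

def filter_banned_counties_sketch(df, cluster_col='cluster_id',
                                  banned_col='Banned or not'):
    """Assumed behavior: for each cluster, keep its banned counties
    only if the cluster has more than two of them; None if nothing qualifies."""
    banned_frames = []
    for _, group in df.groupby(cluster_col):
        banned = group[group[banned_col] == 1]
        if len(banned) > 2:
            banned_frames.append(banned)
    return banned_frames or None

# Made-up example: cluster 0 has three banned counties, cluster 1 has one.
demo = pd.DataFrame({
    'cluster_id': [0, 0, 0, 0, 1, 1],
    'Banned or not': [1, 1, 1, 0, 1, 0],
})
result = filter_banned_counties_sketch(demo)
print(len(result))  # only cluster 0 qualifies
```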

The get_banned_cluster_coords_dict function creates a dictionary of coordinate pairs for each pair of columns in the banned counties DataFrame.

The bannedCoords function takes a list of banned county DataFrames and the original DataFrame and creates a list of dictionaries containing coordinate pairs for each pair of columns in the banned county DataFrame.

The clusterk_dict_banned function takes a list of banned county DataFrames and a list of dictionaries containing coordinate pairs. It creates a dictionary with keys representing each banned county cluster and values containing the respective coordinate pairs.

Finally, the cluster0Banned, cluster1Banned, and cluster2Banned functions return the coordinate pairs for the banned counties in clusters 0, 1, and 2, respectively, based on the input dictionary.

In [6]:
from functionsBanned import filter_banned_counties, get_banned_cluster_coords_dict, bannedCoords, clusterk_dict_banned,\
cluster0Banned, cluster1Banned, cluster2Banned
In [7]:
# 1
banned_counties_df = filter_banned_counties(cluster2_df)

# 2
bannedCoords_list = bannedCoords(banned_counties_df, cluster2_df)

# 3
clusterKBanned_dict = clusterk_dict_banned(banned_counties_df, bannedCoords_list)
Cluster0 had enough banned counties to find non-banned counties in the banned counties convex hull.
Cluster2 had enough banned counties to find non-banned counties in the banned counties convex hull.
In [8]:
# Build one banned-coordinate dictionary per qualifying cluster
if len(banned_counties_df) >= 1:
    cluster0_banned_dict = cluster0Banned(clusterKBanned_dict)
if len(banned_counties_df) >= 2:
    cluster1_banned_dict = cluster1Banned(clusterKBanned_dict)
if len(banned_counties_df) >= 3:
    cluster2_banned_dict = cluster2Banned(clusterKBanned_dict)

for i in range(len(banned_counties_df)):
    print(f"cluster{i}_banned_dict has been made into a dictionary")

cluster0_banned_dict has been made into a dictionary
cluster1_banned_dict has been made into a dictionary

Now to look at the counties inside the convex hull of the banned counties¶

The countyName.py file is a Python script that imports various libraries/modules such as NumPy, Pandas, Matplotlib, Seaborn, and Scikit-learn to perform data analysis, data visualization, and machine learning tasks. It contains several functions that take a Pandas DataFrame (df) and other input parameters as arguments and perform specific tasks.

The banned_counties_list function returns a list of county names that have been banned based on a column in the DataFrame called "Banned or not."

The countyNames_cluster0, countyNames_cluster1, and countyNames_cluster2 functions take four inputs: the original DataFrame, a dictionary (clusterk_dict) containing the x and y coordinates of the counties in each cluster, a dictionary (clusterk_banned_dict) containing the x and y coordinates of the banned counties in each cluster (note that not every cluster will contain a banned county), and finally a list of all the banned counties (bannedCountiesList).

These three functions return a dictionary where the keys are tuples (pairs of coordinates) and the values are lists of county names that belong to a specific cluster. Each function uses ConvexHull and mpath.Path to find a cluster's boundary and return the county names inside that boundary. If a cluster contains multiple banned counties, the merge_dicts function merges two dictionaries (dict1 and dict2) and returns the merged dictionary.
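The core hull-membership step those functions perform can be sketched as follows; the function name and data here are hypothetical, but the ConvexHull-plus-Path pattern matches the description above:

```python
import numpy as np
import matplotlib.path as mpath
from scipy.spatial import ConvexHull

def counties_inside_hull(banned_xy, all_xy, all_names):
    """Hypothetical sketch: return the names of counties whose (x, y)
    coordinates fall inside the convex hull of the banned counties."""
    hull = ConvexHull(banned_xy)
    hull_path = mpath.Path(banned_xy[hull.vertices])
    mask = hull_path.contains_points(all_xy)
    return [name for name, inside in zip(all_names, mask) if inside]

# Made-up coordinates: a square hull and three candidate counties.
banned_xy = np.array([[0, 0], [4, 0], [4, 4], [0, 4]])
all_xy = np.array([[2, 2], [5, 5], [1, 3]])
names = ['A', 'B', 'C']
print(counties_inside_hull(banned_xy, all_xy, names))  # ['A', 'C']
```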

In [9]:
from countyName import banned_counties_list, countyNames_cluster0, countyNames_cluster1, countyNames_cluster2, merge_dicts
In [10]:
bannedCountiesList = banned_counties_list(cluster2_df)


# Collect the non-banned counties inside each cluster's banned-county hull,
# then merge the per-cluster results into one dictionary
if len(banned_counties_df) == 1:
    cluster0_banned_dict = cluster0Banned(clusterKBanned_dict)
    allCounties = countyNames_cluster0(cluster2_df, cluster0_dict, cluster0_banned_dict, bannedCountiesList)

elif len(banned_counties_df) == 2:
    cluster0_banned_dict = cluster0Banned(clusterKBanned_dict)
    cluster1_banned_dict = cluster1Banned(clusterKBanned_dict)

    countyName0 = countyNames_cluster0(cluster2_df, cluster0_dict, cluster0_banned_dict, bannedCountiesList)
    countyName1 = countyNames_cluster1(cluster2_df, cluster1_dict, cluster1_banned_dict, bannedCountiesList)

    allCounties = merge_dicts(countyName0, countyName1)

elif len(banned_counties_df) == 3:
    cluster0_banned_dict = cluster0Banned(clusterKBanned_dict)
    cluster1_banned_dict = cluster1Banned(clusterKBanned_dict)
    cluster2_banned_dict = cluster2Banned(clusterKBanned_dict)

    countyName0 = countyNames_cluster0(cluster2_df, cluster0_dict, cluster0_banned_dict, bannedCountiesList)
    countyName1 = countyNames_cluster1(cluster2_df, cluster1_dict, cluster1_banned_dict, bannedCountiesList)
    countyName2 = countyNames_cluster2(cluster2_df, cluster2_dict, cluster2_banned_dict, bannedCountiesList)

    allCounties = merge_dicts(merge_dicts(countyName0, countyName1), countyName2)

for i in range(len(banned_counties_df)):
    print(f"cluster{i}_banned_dict has been made into a dictionary")

cluster0_banned_dict has been made into a dictionary
cluster1_banned_dict has been made into a dictionary

Now that we have all the column pairs with non-banned counties inside their respective banned-county convex hulls, we can look at which variable pairs contain the most counties in the banned counties' convex hull.

Getting the top variable pairs and their respective counties¶

The topVariableFunctions.py file contains several functions related to data analysis, visualization, and preprocessing for machine learning. Here is a brief explanation for each function:

filtered_var_pairs This function takes a dictionary called "allCounties" as input. It returns a list of key-value pairs where each key is a tuple containing two strings (category and subcategory), and each value is a list of counties. This function filters the original dictionary and extracts the relevant information for further analysis.

categoryCountyList takes the filtered list of key-value pairs as input and returns a dictionary where each key is a category string, and each value is a list of counties. The function also prints out the category with the most counties.

subcategoryCountyList takes the filtered list of key-value pairs as input and returns a dictionary where each key is a subcategory string, and each value is a list of counties. The function also prints out the subcategory with the most counties.
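A hedged sketch of how categoryCountyList (and, symmetrically, subcategoryCountyList) might group the filtered pairs; the function name and data below are hypothetical, the real implementation is in topVariableFunctions.py:

```python
def category_county_list_sketch(filtered_pairs):
    """Assumed behavior: group counties by the first element (the category)
    of each (category, subcategory) key and report the largest group."""
    by_category = {}
    for (category, _subcategory), counties in filtered_pairs:
        by_category.setdefault(category, []).extend(counties)
    top = max(by_category, key=lambda k: len(by_category[k]))
    print(f"The category with the most counties is '{top}' "
          f"with {len(by_category[top])} counties.")
    return by_category

# Made-up filtered key-value pairs.
pairs = [(('rent', 'permits'), ['Adams', 'Allen']),
         (('rent', 'income'), ['Erie']),
         (('income', 'permits'), ['Lake'])]
result = category_county_list_sketch(pairs)
```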

commonKeys This function takes two dictionaries as input and returns a new dictionary that contains the intersection of their keys and the union of their values.
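That key-intersection/value-union behavior can be sketched like this (a hypothetical reconstruction, not the actual topVariableFunctions.py code):

```python
def common_keys_sketch(dict1, dict2):
    """Assumed behavior of commonKeys: intersect the keys and, for each
    shared key, take the union of the two county lists."""
    shared = dict1.keys() & dict2.keys()
    return {k: sorted(set(dict1[k]) | set(dict2[k])) for k in shared}

# Made-up category and subcategory dictionaries.
d1 = {'rent': ['Adams', 'Allen'], 'permits': ['Erie']}
d2 = {'rent': ['Allen', 'Athens'], 'income': ['Lake']}
print(common_keys_sketch(d1, d2))  # {'rent': ['Adams', 'Allen', 'Athens']}
```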

freq_var This function takes a ranked list as input and returns a list of tuples where each tuple contains a county name and a list of variable names associated with that county. The ranked list is assumed to have a specific structure, with indices for rank, variable name, county count, and the list of counties associated with the variable.

In [11]:
from topVariableFunctions import filtered_var_pairs, categoryCountyList, subcategoryCountyList, commonKeys, freq_var
In [12]:
filt = filtered_var_pairs(allCounties)
In [13]:
categorycountyList = categoryCountyList(filt)
The category with the most counties is 'Owner-occupied housing unit rate 2017-2021' with 27 counties.
In [14]:
subcategorycountyList=subcategoryCountyList(filt)
The subcategory with the most counties is 'Median selected monthly owner costs -with a mortgage 2017-2021' with 27 counties.
In [15]:
combinedVars = commonKeys(categorycountyList, subcategorycountyList)

After all variables are combined, we can take a look at the counties that may be at risk and the most common variables among them.

Visualization using Common Counties and Frequent Variables¶

Table of Demographic Variables and a List of Counties¶

In [16]:
sorted_var_dict = dict(sorted(combinedVars.items(), key=lambda x: len(x[1]), reverse=True))
In [17]:
# Make ranking table of vars.
ranked_listTbl = [(i+1, item[0], len(item[1]), ", ".join([textwrap.fill(', '.join(textwrap.wrap(c, width=10)), width=80) for c in item[1]])) for i, item in enumerate(sorted_var_dict.items())]
topVarTbl = pd.DataFrame(ranked_listTbl)
topVarTbl.columns=['Rank', 'Demographic_variable', 'Number_of_counties', 'List_of_Counties']
In [18]:
# Adjust the widths of the table columns
headerColor = 'grey'
rowEvenColor = 'lightgrey'
rowOddColor = 'white'


fig = go.Figure(data=[go.Table(
    columnwidth = [50,110,75,700],

    header=dict(
        values=['Rank', 'Demographic Variable', 'Number of Counties', 'List of Counties'],
                    line_color='darkslategray',
                    fill_color=headerColor,
                    align=['left','center'],
                    font=dict(color='white', size=12)
    ),
    cells=dict(
        values=[topVarTbl.Rank, topVarTbl.Demographic_variable, topVarTbl.Number_of_counties, topVarTbl.List_of_Counties],
               line_color='darkslategray',
    # 2-D list of colors for alternating rows
    fill_color = [[rowOddColor,rowEvenColor,rowOddColor, rowEvenColor,rowOddColor]*5],
    align = ['center', 'center', 'center', 'left'],
    font = dict(color = 'darkslategray', size = 11)
    ))
])

fig.show()

The table above gives a ranking for each demographic variable in their respective subset of data, the number of counties, and a list of the counties.

Plotting county frequency in each variable¶

Checking the values of each county that appeared in the table above.

In [19]:
ranked_list = [(i+1, item[0], len(item[1]), item[1]) for i, item in enumerate(sorted_var_dict.items())]
newdf2 = pd.DataFrame(ranked_list)
newdf2.columns=['Rank', 'Demographic_variable', 'Number_of_counties', 'List_of_Counties']

Transforming newdf2 into tidy format¶

In [20]:
# Separate the list of counties into individual rows
newdf2 = newdf2.explode('List_of_Counties')

# Drop duplicate rows
newdf2 = newdf2.drop_duplicates(subset=['Rank', 'List_of_Counties'])

# Rename the columns for clarity
newdf3 = newdf2.rename(columns={
    'Demographic_variable': 'Demographic_Variable',
    'Number_of_counties': 'Number_of_Counties',
    'List_of_Counties': 'County'
})
In [21]:
# Melting OG data:
melted_df = pd.melt(cluster2_df, id_vars=['County Name'], var_name='Attribute', value_name='Value')
melted_df['County Name'] = melted_df['County Name'].apply(lambda x: x[:-13])
In [22]:
tidy_df = pd.merge(newdf3, melted_df[['County Name', 'Attribute', 'Value']], left_on=['County','Demographic_Variable'],
                   right_on=['County Name', 'Attribute'])
tidy_df.drop(['Attribute','County Name'], axis=1, inplace=True)

Frequency of Each County and Their Values Among the Most Common Variables Found¶

In [23]:
# create bar chart trace
fig = px.bar(tidy_df, x=tidy_df.Demographic_Variable, y=tidy_df.Number_of_Counties, text='Value', color='County',
             labels={"Attribute": "Attribute",
                     "Value": "Attribute Value",
                     "Number_of_Counties":'Count of each County',
                     'Demographic_Variable': 'Top Housing Demographic Variables'
                 },
             title="Frequency of Each County With the Most Common Variables",)

fig.update_traces(textposition='inside')
#  update the layout to adjust the size of the plot
fig.update_layout(
    width=1200,  # set the width of the plot to 1200 pixels
    height=1400,  # set the height of the plot to 1400 pixels
)

# display the plot
fig.show()

Table of Counties and a list of Housing Demographic Variables¶

In [24]:
var_freq = freq_var(ranked_list)
In [25]:
# Convert var_freq to a dictionary
var_freq_dict = {county: variables for county, variables in var_freq}
In [26]:
# sorting var_freq_dict dictionary
sorted_var_freq_dict = dict(sorted(var_freq_dict.items(), key=lambda x: len(x[1]), reverse=True))
In [27]:
# Make ranking table of vars.
ranked_var_freqViz = [(i+1, item[0], len(item[1]), ", ".join([textwrap.fill(' '.join(textwrap.wrap(c, width=10)), width=80) for c in item[1]])) for i, item in enumerate(sorted_var_freq_dict.items())]
top_var_freqdfViz = pd.DataFrame(ranked_var_freqViz)
top_var_freqdfViz.columns=['Rank', 'County', 'Number_of_variables', 'List_of_variables']
In [28]:
headerColor = 'grey'
rowEvenColor = 'lightgrey'
rowOddColor = 'white'


fig = go.Figure(data=[go.Table(
    columnwidth = [40,70,100,600],

    header=dict(
        values=['Rank', 'County Name', 'Number of Demographic Variables', 'List of Demographics'],
                    line_color='darkslategray',
                    fill_color=headerColor,
                    align=['center','center', 'center', 'left'],
                    font=dict(color='white', size=12)
    ),
    cells=dict(
        values=[top_var_freqdfViz.Rank, top_var_freqdfViz.County, top_var_freqdfViz.Number_of_variables,
                top_var_freqdfViz.List_of_variables],
               line_color='darkslategray',
    # 2-D list of colors for alternating rows
    fill_color = [[rowOddColor,rowEvenColor,rowOddColor, rowEvenColor,rowOddColor]*5],
    align = ['center', 'center', 'center', 'left'],
    font = dict(color = 'darkslategray', size = 11)
    ))
])

fig.show()

Tidying for plot¶

In [29]:
# Make ranking table of vars.
ranked_vars_list = [(i+1, item[0], len(item[1]), item[1]) for i, item in enumerate(sorted_var_freq_dict.items())]
top_var_freqdf = pd.DataFrame(ranked_vars_list)
top_var_freqdf.columns=['Rank', 'County', 'Number_of_variables', 'List_of_variables']
In [30]:
# Separate the list of variables into individual rows
test2 = top_var_freqdf.explode('List_of_variables')

# Drop duplicate rows
test2 = test2.drop_duplicates(subset=['Rank', 'List_of_variables'])

# Rename the exploded column for clarity
test5 = test2.rename(columns={'List_of_variables': 'Demographic_Variable'})

# Display the updated data frame
# test5.head()
In [31]:
tidy_df2 = pd.merge(test5, melted_df[['County Name', 'Attribute', 'Value']], left_on=['County','Demographic_Variable'],
                   right_on=['County Name', 'Attribute'])

Frequency of Each Housing Demographic Variable and Their Values Among the Most Common Counties Found¶

In [32]:
# create bar chart trace
fig = px.bar(tidy_df2, x=tidy_df2.County, color='Demographic_Variable',text='Value',
             labels={"Attribute": "Attribute",
                     "Value": "Attribute Value",
                     "Demographic_Variable":"Demographic Variables"
                 },
             title="Frequency of Each Housing Demographic Variable With the Most Common Counties",)

fig.update_traces(textposition='inside')
# update the layout to adjust the size of the plot
fig.update_layout(
    width=1500,  # set the width of the plot to 1500 pixels
    height=500,  # set the height of the plot to 500 pixels
)

# display the plot
fig.show()

Most Frequent Demographic Variables¶

In [33]:
## Most common variables
res = sum(sorted_var_freq_dict.values(), [])
mostCommonVar = list(set(res))
mostCommonVar
Out[33]:
['Owner-occupied housing unit rate 2017-2021',
 'Living in same house 1 year ago of persons age 1 year+ 2017-2021',
 'Building permits 2021',
 'Households 2017-2021',
 'Households with a computer 2017-2021',
 'Persons per household 2017-2021',
 'Language other than English spoken at home of persons age 5 years+ 2017-2021',
 'Median selected monthly owner costs -without a mortgage 2017-2021',
 'Median value of owner-occupied housing units 2017-2021',
 'Median gross rent 2017-2021',
 'Median selected monthly owner costs -with a mortgage 2017-2021']

Most Frequent Counties¶

In [34]:
mostCommonCounties = top_var_freqdfViz['County'].tolist()
print(mostCommonCounties)
['Fairfield', 'Sandusky', 'Huron', 'Ashland', 'Erie', 'Darke', 'Tuscarawas', 'Marion', 'Washington', 'Jefferson', 'Belmont', 'Brown', 'Athens', 'Clark', 'Lake', 'Ashtabula', 'Clinton', 'Lawrence', 'Guernsey', 'Hardin', 'Ottawa', 'Ross', 'Scioto', 'Greene', 'Licking', 'Mahoning', 'Preble', 'Fayette', 'Trumbull', 'Paulding', 'Putnam', 'Van Wert', 'Henry', 'Wayne', 'Shelby', 'Muskingum', 'Richland', 'Perry', 'Wyandot', 'Hocking', 'Champaign', 'Fulton', 'Highland', 'Williams', 'Geauga', 'Noble', 'Mercer', 'Vinton', 'Defiance', 'Carroll', 'Gallia', 'Miami', 'Warren', 'Clermont']

Plotting the Counties at Risk on a Map of Ohio¶

Making an Excel file with the most frequent variables for the Housing Demographic Data¶

In [35]:
counties_ohio = [county + ' County, Ohio' for county in mostCommonCounties]
In [36]:
y = cluster2_df[cluster2_df['County Name'].isin(counties_ohio)]
mostFreqcountyVarTbl = y.iloc[:, :-3].copy()
mostFreqcountyVarTbl['County'] = y.iloc[:, -1]
mostFreqcountyVarTbl.head()
mostFreqcountyVarTbl.to_excel('mostFreqHousingData.xlsx')
In [37]:
# Creating a new data frame with only the most common variables
mapPlotdf = cluster2_df[list(combinedVars.keys())].copy()
In [38]:
with urlopen('https://raw.githubusercontent.com/plotly/datasets/master/geojson-counties-fips.json') as response:
    counties = json.load(response)

df = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/minoritymajority.csv',
                   dtype={"fips": str})

# Filter the data frame to include only Ohio
df_ohio = df[df['STNAME'] == 'Ohio']

# Filter the counties GeoJSON file to include only Ohio counties
counties_ohio = {'type': 'FeatureCollection', 'features': []}
for feature in counties['features']:
    if feature['id'][:2] == '39':
        counties_ohio['features'].append(feature)
In [39]:
# Add a temporary county-name column with the trailing ", Ohio" stripped
mapPlotdf['TempCounty Name'] = cluster2_df['County Name'].apply(lambda x: x[:-6])
In [40]:
# merge the two data frames based on the 'County Name' and 'CTYNAME' columns
merged_df = pd.merge(mapPlotdf, df_ohio[['CTYNAME', 'FIPS']], left_on='TempCounty Name', right_on='CTYNAME')

# drop the duplicate 'CTYNAME' column
merged_df.drop('CTYNAME', axis=1, inplace=True)
merged_df.drop('TempCounty Name', axis=1, inplace=True)
In [41]:
merged_df['TempCounty Name'] = cluster2_df['County Name'].apply(lambda x: x[:-13])
In [42]:
# Example list of counties to check
counties_to_check = merged_df['TempCounty Name'].tolist()

# Create a list to hold safe counties
safe_counties = []

# Check each county and append to the safe_counties list if not in either of the two lists
for county in counties_to_check:
    if county not in mostCommonCounties and county not in bannedCountiesList:
        safe_counties.append(county)

# Print the list of safe counties
print(safe_counties)
['Adams', 'Columbiana', 'Coshocton', 'Cuyahoga', 'Delaware', 'Franklin', 'Hamilton', 'Harrison', 'Holmes', 'Jackson', 'Lorain', 'Lucas', 'Madison', 'Meigs', 'Monroe', 'Montgomery', 'Morgan', 'Morrow', 'Pickaway', 'Pike', 'Portage', 'Stark', 'Wood', 'Summit']
In [43]:
# Label each county's risk level; any county that is neither at risk nor banned
# is safe by construction (see the safe_counties list above)
merged_df['risk'] = merged_df['TempCounty Name'].apply(lambda x: 'At risk' if x in mostCommonCounties else
                                                       'Already has Banned' if x in bannedCountiesList else
                                                       'Safe')
In [44]:
merged_df
Out[44]:
Owner-occupied housing unit rate 2017-2021 Living in same house 1 year ago of persons age 1 year+ 2017-2021 Building permits 2021 Households 2017-2021 Households with a computer 2017-2021 Persons per household 2017-2021 Language other than English spoken at home of persons age 5 years+ 2017-2021 Median selected monthly owner costs -without a mortgage 2017-2021 Median value of owner-occupied housing units 2017-2021 Median gross rent 2017-2021 Median selected monthly owner costs -with a mortgage 2017-2021 FIPS TempCounty Name risk
0 73.2 24760.26 25.0 10163 23713.66 2.68 495.76 403 118300 593 1159 39001 Adams Safe
1 66.9 88046.22 226.0 40671 92113.02 2.42 3151.77 457 126900 758 1126 39003 Allen Already has Banned
2 76.5 45148.71 61.0 20531 46299.66 2.46 3191.28 432 143100 748 1124 39005 Ashland At risk
3 71.6 84877.86 168.0 38332 85948.57 2.47 6521.58 426 121100 735 1086 39007 Ashtabula At risk
4 59.8 43439.20 110.0 22381 56222.74 2.41 3226.91 464 150800 840 1210 39009 Athens At risk
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
83 64.2 106375.02 547.0 53010 126113.34 2.37 6093.71 553 177000 845 1437 39173 Wood Safe
84 81.2 55095.41 1253.0 21654 62177.25 2.71 4612.94 626 242400 1050 1696 39159 Union Already has Banned
85 72.5 20123.32 70.0 9036 19884.53 2.40 607.82 420 133600 682 1139 39175 Wyandot At risk
86 66.8 472041.77 788.0 228283 497310.52 2.33 37634.31 525 159800 882 1299 39153 Summit Safe
87 78.0 25887.53 37.0 11553 26088.66 2.47 689.57 405 112300 721 957 39161 Van Wert At risk

88 rows × 14 columns

In [45]:
# 'risk' is categorical, so a discrete color map is used (no continuous scale)
fig = px.choropleth(merged_df, geojson=counties_ohio, locations='FIPS', color='risk',
                    scope="usa",
                    hover_data=["TempCounty Name", 'Median selected monthly owner costs -without a mortgage 2017-2021',
 'Median selected monthly owner costs -with a mortgage 2017-2021', 'Median gross rent 2017-2021',
],
                    labels={'risk':'Risk Level',
                           'TempCounty Name': 'County Name',
                           'Median selected monthly owner costs -without a mortgage 2017-2021':'Median Monthly Owner Cost (Without Mortgage)',
                           'Median selected monthly owner costs -with a mortgage 2017-2021': 'Median Monthly Owner Cost (With Mortgage)'},
                    color_discrete_map={'Already has Banned': 'red',
                                        'At risk': 'blue', 'Safe': 'green'})

fig.update_geos(fitbounds="locations", visible=False)
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})


fig.show()